Combining the flexibility of speech synt pre-recorded audio: a comparison of two ap
نویسنده
چکیده
Many applications of TTS incorporate both unpredictable words, which require the flexibility of TTS, and static phrases, for which the quality of recorded speech is unmatched by TTS. “Phrase-splicing” TTS attempts to provide the optimal combination of the two, by customizing concatenative TTS to such applications by incorporating application-specific recordings at the word or phrase level while resorting to smaller-unit synthesis to fill the gaps not covered by those recordings. In the past, we have achieved this by using a word-level search on the application-specific recordings followed by a generalpurpose TTS search, in our case using sub-phonetic units, to fill the gaps. However, recent trends toward larger-unit roles in general-purpose TTS suggest a single-search approach for phrase splicing. A listening test shows that we achieve at least as high quality with the new one-search algorithm as with twosearch.
منابع مشابه
Exploring EFL Learners’ Use of Formulaic Sequences in Pragmatically Focused Role-play Tasks
Communicative language use largely entails regular patterns consisting of pre-constructed phrases or sequences. These sequences have been examined by many researchers to find the situation-based formulas which may help L2 learners follow a possibly more target-like speaking system. This study, therefore, explored two categories of formulaic expressions including speech formulas and situation-bo...
متن کاملCombining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)
Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...
متن کاملComparing the Impact of Audio-Visual Input Enhancement on Collocation Learning in Traditional and Mobile Learning Contexts
: This study investigated the impact of audio-visual input enhancement teaching techniques on improving English as Foreign Language (EFL) learnersˈ collocation learning as well as their accuracy concerning collocation use in narrative writing. In addition, it compared the impact and efficiency of audio-visual input enhancement in two learning contexts, namely traditional and mo...
متن کاملSpeech synthesis by structured segments, using temporal decomposition and a glottal excitation
Classical speech synthesis systems either concatenate diphone-like tabulated pattems or reconstmct speech parameters according to pre-defmed mles. Both techniques show drawbacks : the fonner lacks flexibility while the lauer is highly time-consuming_ to built. We propose an intennediate technique using structured segments : segmental units are still resorted to, but they are automatically analy...
متن کاملThe Efficacy of Audio Input Flooding Tasks on Learning Grammar: Uptake of Present Tense
This study sought to probe the role of input flooding through listening tasks on the uptake of simple present tense and the present progressive tense among pre - intermediate English a s Foreign Language ( EFL ) learners. To comply with the objective, an experimental design was adopted. 55 pre - intermediate learners participated in the study. They were randomly divided into one control group, ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005